Conversation

@wietzesuijker (Contributor)

Problem

The item at https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-dp-test/items/S2A_MSIL2A_20251023T105131_N0511_R051_T31UET_20251023T122522 doesn't preview data properly.

Root cause: The workflow template was not passing override parameters to convert.py, so conversions were using incorrect defaults instead of the collection-specific configs.

Changes

  • Restore passing of override_groups, override_spatial_chunk, override_tile_width, and override_enable_sharding to convert.py
  • Fix the --enable-sharding arg to accept string values instead of acting as a boolean flag (allows an empty-string fallback)
  • When overrides are empty strings, convert.py falls back to the collection defaults in CONFIGS (sketched below)
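A minimal sketch of that fallback, assuming a CONFIGS mapping keyed by collection and string-typed override arguments; the values below are illustrative, not the production config in convert.py:

```python
# Illustrative only: empty-string overrides fall back to per-collection defaults.
CONFIGS = {
    "sentinel-2-l2a": {
        "groups": "/measurements",   # assumed value, not the real config
        "spatial_chunk": 4096,
        "tile_width": 512,
        "enable_sharding": True,
    },
}

def resolve_params(collection: str, override_groups: str, override_spatial_chunk: str,
                   override_tile_width: str, override_enable_sharding: str) -> dict:
    """Use an override only when it is a non-empty string."""
    defaults = CONFIGS[collection]
    return {
        "groups": override_groups or defaults["groups"],
        "spatial_chunk": int(override_spatial_chunk) if override_spatial_chunk else defaults["spatial_chunk"],
        "tile_width": int(override_tile_width) if override_tile_width else defaults["tile_width"],
        # --enable-sharding is now a string ("true"/"false"/""), so "" can mean "use the default"
        "enable_sharding": override_enable_sharding.lower() == "true"
        if override_enable_sharding else defaults["enable_sharding"],
    }
```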

Testing

Will test with a workflow run once the Docker image builds.

Add complete Argo Workflows infrastructure for the geozarr pipeline with automated AMQP event triggering.

Workflow pipeline:
- Convert: Sentinel-2 Zarr → GeoZarr (cloud-optimized)
- Register: Create STAC item with metadata
- Augment: Add visualization links (XYZ tiles, TileJSON)

Event-driven automation:
- AMQP EventSource subscribes to RabbitMQ queue
- Sensor triggers workflows on incoming messages
- RBAC configuration for secure execution

Configuration:
- Python dependencies (pyproject.toml, uv.lock)
- Pre-commit hooks (ruff, mypy, yaml validation)
- TTL cleanup (24h auto-delete completed workflows)

Add STAC registration, augmentation, and workflow submission scripts.

- register_stac.py: Create/update STAC items with S3→HTTPS rewriting
- augment_stac_item.py: Add visualization links (XYZ tiles, TileJSON)
- submit_via_api.py: Submit workflows via Argo API for testing
- Retry with exponential backoff on transient failures
- Configurable timeouts via HTTP_TIMEOUT, RETRY_ATTEMPTS, RETRY_MAX_WAIT
- Workflow step timeouts: 1h convert, 5min register/augment
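The retry/timeout behavior described above could look roughly like this; the environment variable names come from the list, while the use of tenacity and requests is an assumption for illustration:

```python
import os

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

HTTP_TIMEOUT = float(os.getenv("HTTP_TIMEOUT", "30"))
RETRY_ATTEMPTS = int(os.getenv("RETRY_ATTEMPTS", "3"))
RETRY_MAX_WAIT = float(os.getenv("RETRY_MAX_WAIT", "60"))

@retry(stop=stop_after_attempt(RETRY_ATTEMPTS),
       wait=wait_exponential(multiplier=1, max=RETRY_MAX_WAIT))
def put_stac_item(stac_api: str, collection: str, item: dict) -> None:
    """PUT a STAC item, retrying transient failures with exponential backoff."""
    url = f"{stac_api}/collections/{collection}/items/{item['id']}"
    resp = requests.put(url, json=item, timeout=HTTP_TIMEOUT)
    resp.raise_for_status()  # non-2xx raises, which re-triggers the retry decorator
```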
Add operator notebooks and environment configuration.

- Submit workflow examples (AMQP and direct API)
- Environment variable template (.env.example)
- .gitignore for Python, IDEs, Kubernetes configs

Add container build configuration and development tooling.

- Dockerfile for data-pipeline image
- Makefile for common tasks (build, push, test)
- GitHub Container Registry integration

Add comprehensive testing and project documentation.

- Unit tests for register_stac and augment_stac_item
- Integration tests for workflow submission
- E2E test configuration
- Project README, CONTRIBUTING, QUICKSTART guides
- CI workflow (GitHub Actions)

Extend pipeline to support Sentinel-1 GRD collections:

- S1 GRD workflow configuration and test payloads
- Collection detection logic (get_crs.py extended for S1)
- Staging namespace deployment (rbac-staging.yaml)
- S1-specific STAC registration handling
- End-to-end S1 test suite
- v20-v22 image iterations with S1 support

Enables multi-mission pipeline supporting both S2 L2A and S1 GRD products.

Add comprehensive S1 GRD pipeline documentation and example code.

docs/s1-guide.md:
- S2 vs S1 feature comparison (groups, flags, chunks, polarizations)
- Collection registry config for sentinel-1-l1-grd
- Preview generation logic (grayscale with polarization detection)
- Test data sources (EODC STAC)
- Workflow parameters for S1 conversion
- Known issues (GCP reprojection, memory, TiTiler rescaling)
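The polarization-detection step behind the grayscale preview, sketched under assumptions (the function name and preference order are illustrative, not the documented implementation):

```python
def pick_preview_polarization(available: list[str]) -> str:
    """Choose a single band for a grayscale preview: prefer VV, fall back to VH."""
    for pol in ("VV", "VH"):
        if pol in available:
            return pol
    raise ValueError(f"No VV/VH polarization found in {available!r}")

# e.g. pick_preview_polarization(["VH", "VV"]) -> "VV"
```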

examples/s1_quickstart.py:
- End-to-end S1 pipeline: fetch → convert → register → augment
- Demonstrates S1-specific flags: --gcp-group, --spatial-chunk 2048
- Example using EODC S1C_IW_GRDH test item
- Local development workflow

Usage:
  python examples/s1_quickstart.py

Generalize pipeline through collection registry pattern:

- Collection-specific parameter registry (groups, chunks, tile sizes)
- Dynamic parameter lookup script (get_conversion_params.py)
- Registry integration across all workflow stages
- Support for S2 L2A and S1 GRD with distinct parameters
- Kustomize-based deployment structure

Enables scalable addition of new missions (S3, S5P, etc.) through
registry configuration without code changes.
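In outline, the registry maps a mission prefix to its conversion parameters; the structure below is a sketch with illustrative values, not the contents of get_conversion_params.py:

```python
# Sketch: per-mission defaults resolved by collection-id prefix.
REGISTRY = {
    "sentinel-1": {"groups": "/measurements", "spatial_chunk": 4096, "tile_width": 512},
    "sentinel-2": {"groups": "/measurements", "spatial_chunk": 4096, "tile_width": 512},
}

def get_conversion_params(collection_id: str) -> dict:
    """Resolve parameters by prefix, e.g. 'sentinel-2-l2a' -> the 'sentinel-2' entry."""
    for prefix, params in REGISTRY.items():
        if collection_id.startswith(prefix):
            return params
    raise ValueError(f"No registry entry for collection {collection_id!r}")
```

Adding a new mission then means adding a registry entry rather than touching workflow code.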
Add comprehensive performance measurement and validation:

- Automated validation workflow task (validate_geozarr.py)
- Performance benchmarking tools (benchmark_comparison.py, benchmark_tile_performance.py)
- Production metrics from 9 operational workflows (8.6 min avg, 75% success)
- Ecosystem compatibility validation (zarr-python, xarray, stac-geoparquet)
- User guide for adding new collections (docs/ADDING_COLLECTIONS.md)
- Performance report with operational metrics (docs/PERFORMANCE_REPORT.md)

Production validation shows pipeline ready for deployment with
validated performance and ecosystem compatibility.

Enable parallel chunk processing with Dask distributed:

- Add --dask-cluster flag to conversion workflow
- Update to v26 image with Dask support
- Add validation task between convert and register stages

Initial test shows 1.6× speedup (320s vs 516s baseline).
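As a rough illustration of what --dask-cluster enables, the conversion step can attach a dask.distributed client so chunk writes run in parallel; worker counts and memory limits below are placeholders:

```python
from dask.distributed import Client, LocalCluster

def maybe_start_dask(enabled: bool) -> Client | None:
    """Start a local distributed cluster when --dask-cluster is set."""
    if not enabled:
        return None
    cluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit="2GB")
    return Client(cluster)  # subsequent xarray/zarr operations run across workers
```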
The task was defined but never referenced in the DAG (lines 25-37).

Add workflow parameters:
- stac_api_url, raster_api_url (API endpoints)
- s3_endpoint, s3_output_bucket, s3_output_prefix (S3 config)

Replace all hardcoded values with parameter references for:
- STAC/raster API URLs in register/augment tasks
- S3 endpoint in all tasks
- S3 bucket/prefix in convert/validate/register tasks

Enables easy environment switching (dev/staging/prod) via parameter override.

Three Jupyter notebooks demonstrating GeoZarr data access and pyramid features:

01_quickstart.ipynb
- Load GeoZarr from S3 with embedded STAC metadata
- Visualize RGB composites
- Inspect geospatial properties

02_pyramid_performance.ipynb
- Benchmark tile serving with/without pyramids
- Measure the observed 3-5× speedup at zoom levels 6-10
- Calculate storage tradeoffs (33% overhead)

03_multi_resolution.ipynb
- Access individual pyramid levels (0-3)
- Compare sizes (4.7 MB → 72 KB)
- Explore quality vs size tradeoffs

These notebooks help users understand the pipeline outputs and evaluate
pyramid benefits for their use cases. Still evolving as we refine the
conversion process and gather production feedback.
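The access pattern the notebooks demonstrate is roughly the following; the bucket, endpoint, and group path are placeholders rather than the production layout:

```python
import xarray as xr

# Level 0 is full resolution; higher-numbered groups are coarser pyramid levels.
store = "s3://example-bucket/sentinel-2-l2a/EXAMPLE_ITEM.zarr/r10m/0"
ds = xr.open_zarr(
    store,
    consolidated=True,
    storage_options={"anon": True, "client_kwargs": {"endpoint_url": "https://s3.example.com"}},
)
print(ds.data_vars)  # inspect available bands, e.g. the TCI composite
```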
Replace inline bash script in workflows/amqp-publish-once.yaml with
scripts/publish_amqp.py. Script is now included in Docker image,
eliminating need for runtime pip installs and curl downloads.

Changes:
- Add scripts/publish_amqp.py with routing key templates and retry
- Update workflows/amqp-publish-once.yaml to use pre-built image
- Add workflows/ directory to docker/Dockerfile
- Add tests/unit/test_publish_amqp.py with pytest-mock
20 tests: pattern matching, S1/S2 configs, CLI output formats
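A condensed sketch of the publisher pattern (the exchange name and routing-key template are illustrative; the real logic lives in scripts/publish_amqp.py):

```python
import json

import pika

def publish(amqp_url: str, payload: dict, routing_key_template: str) -> None:
    """Publish a payload, filling the routing key from payload fields."""
    routing_key = routing_key_template.format(**payload)  # e.g. "eopf.item.found.{collection}"
    conn = pika.BlockingConnection(pika.URLParameters(amqp_url))
    try:
        channel = conn.channel()
        channel.basic_publish(exchange="geozarr", routing_key=routing_key,
                              body=json.dumps(payload).encode())
    finally:
        conn.close()
```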
Tests asset priority logic (product > zarr > any .zarr) and error handling
for missing or malformed STAC items.
Tests subprocess execution, timeout handling, error cases, and CLI
options including file output and verbose mode.
Measures load time and dataset metrics for performance comparison.
Outputs JSON results with speedup factor and format recommendations.

- Add show-parameters step displaying full workflow config in UI
- Add step headers (1/4, 2/4, etc) to all pipeline stages
- Add progress indicators and section dividers for better readability
- Add workflow metadata labels (collection, item-id) for filtering
- Fix sensor event binding (rabbitmq-geozarr/geozarr-events)
- Add S1 E2E test job (amqp-publish-s1-e2e.yaml)

Argo UI now shows:
  • Full payload/parameters in dedicated initial step
  • Clear step numbers and progress for each stage
  • Final URLs for STAC item and S3 output
  • Better context during long-running conversions

Complete validation report showing:
- Successful S1 GRD to GeoZarr conversion
- 21-minute workflow execution (30k × 15k pixels)
- 6-level multiscale pyramids for VV/VH polarizations
- STAC registration with preview links
- UI enhancements validated in Argo
- Collection registry parameters documented
- Fix sys.path in test_publish_amqp.py from parent.parent to parent.parent.parent
- Update S1 spatial_chunk test expectations from 2048 to 4096
- Aligns with code changes in get_conversion_params.py
- Remove test_real_stac_api_connection (only checked HTTP 200, no logic)
- Remove unused os import
- Test had external dependency, was flaky, redundant with mocked tests
- Format long argparse description lines for readability
- No functional changes, purely formatting

- Set archiveLogs: false for immediate log visibility via kubectl
- Change convert-geozarr from script to container template for stdout logs
- Reduce memory request to 6Gi (limit 10Gi) for better cluster scheduling
- Add Dask parallel processing info in comments
- Simplify show-parameters to basic output

Fixes 30-60s log delay in Argo UI. Logs now visible via kubectl immediately.

- Add run-s1-test.yaml for direct kubectl submission
- Update amqp-publish-s1-e2e.yaml with optimized test parameters
- Use S1A item from Oct 3 for consistent testing
- Add WORKFLOW_SUBMISSION_TESTING.md with complete test results
- Update README.md: reorganize by recommendation priority
- Document all 4 submission methods with pros/cons
- Add troubleshooting for log visibility and resource limits
- Simplify Quick Start to 2 commands (30 seconds)
- Document Dask integration and resource optimization

Covers kubectl, Jupyter, event-driven (AMQP), and Python CLI approaches.

Test validation is proven by 93 passing tests, not narrative docs.
- Configure pytest pythonpath to enable script imports (unblocks 90 tests)
- Add exception tracebacks to get_conversion_params error handlers
- Add error trap to validate-setup.sh for line-level diagnostics
- Replace timestamp-based Docker cache with commit SHA for precision
- Add pre-commit hooks (ruff, mypy) for code quality enforcement

Test results: 90/90 passing, 32% coverage

- Add integration-tests job in GitHub Actions (runs on PRs only)
- Add explicit resource requests/limits to all workflow templates
  - convert-geozarr: 6Gi/10Gi memory, 2/4 CPU
  - validate: 2Gi/4Gi memory, 1/2 CPU
  - register-stac: 1Gi/2Gi memory, 500m/1 CPU
  - augment-stac: 1Gi/2Gi memory, 500m/1 CPU

Prevents pod eviction and enables predictable scheduling

wietzesuijker and others added 24 commits (October 22, 2025 18:31)

- Fix all failing unit tests for refactored code
- Add comprehensive tests for create_geozarr_item.py
- Add test coverage for metrics module
- Update test fixtures for new script structure
- Achieve 99% coverage with clean output
- Move test utilities to tools/testing/
- Enable auto-build on all branches for rapid iteration
- Add validation dependencies to pyproject.toml
- Update uv.lock with latest dependencies
- Restructure README with clear Quick Start
- Add inline kubectl YAML examples
- Organize Usage into 3 methods (kubectl, AMQP, Jupyter)
- Add Workflow Steps section explaining pipeline
- Improve Configuration with subsections
- Enhance Troubleshooting with actionable commands
- Update CONTRIBUTING.md and GETTING_STARTED.md
- Update Makefile and examples/
- Remove validation step from pipeline (convert → register)
- Delete validate_geozarr.py script
- Remove tools/, examples/, tests/, docs/ directories
- Remove duplicate workflow YAMLs (rbac, sensor, eventsource at root)
- Consolidate markdown files to 3 (README, workflows/README, notebooks/README)
- Reduce to 6 core scripts (create, register, augment, params, utils, metrics)
- Update Makefile (remove test/test-cov/publish/deploy targets)
- Update pyproject.toml description for minimal pipeline
- Update workflows/base/workflowtemplate.yaml to 2-step DAG
- Update documentation for engineers familiar with Argo/K8s/STAC
- Create convert.py and register.py entry points
- Chain function calls in Python instead of bash
- Eliminate shell variable passing and multiple process spawns
- Preserve individual script CLI interfaces for standalone use
- Cleaner error handling with Python exceptions vs bash exit codes

The slim branch has no tests directory, so integration tests should be skipped like the main test job.

- Remove matrix strategy (not publishing a package, single Python version is sufficient)
- Remove integration-tests job (no integration tests exist)
- Remove hashFiles conditions (unnecessary complexity)
- Use Python 3.11 consistently (matches Docker image)
- Update requires-python to >=3.13
- Update Dockerfile base image to python:3.13-slim
- Update ruff target-version to py313
- Remove 3.11/3.12 classifiers, keep only 3.13
- Remove metrics.py
- Remove prometheus-client dependency
- Remove --enable-metrics flag from register.py
- Remove metrics imports and calls from register_stac.py
- Remove metrics import/usage from augment_stac_item.py
- Remove --enable-metrics from workflow template

Prometheus metrics can be added with a separate PR using the feat/prometheus-metrics-integration branch.
Keep slim branch focused on core pipeline (7 scripts, 1 job).
Notebooks can be re-added on separate feature branch with:
- Proper dependencies in pyproject.toml
- Plug-and-play setup (uv sync --extra notebooks)

- Remove unused environment variable override system
- Remove pattern matching complexity (only 2 missions)
- Simplify to direct prefix lookup (sentinel-1, sentinel-2)
- 157 → 100 lines (36% reduction)
- Same functionality, clearer code

Formats still work:
  --format json   (JSON output)
  --format shell  (shell variables)
  --param groups  (single param)

Register step now passes --s3-output-bucket and --s3-output-prefix
instead of pre-constructed --geozarr-url. Construction happens in
register.py using item_id extracted from source_url.

Workflow YAML: 130 → 111 lines (no inline Python)
register.py: bucket/prefix args, constructs s3://{bucket}/{prefix}/{collection}/{item_id}.zarr
- extract_item_id: replaced with urlparse().path.split("/")[-1]
- get_zarr_url: moved into convert.py
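The construction described above amounts to roughly this (the helper name is assumed; the bucket/prefix/collection layout comes from the commit message):

```python
from urllib.parse import urlparse

def derive_geozarr_url(source_url: str, bucket: str, prefix: str, collection: str) -> str:
    """Build the output store URL from the source item URL and S3 settings."""
    item_id = urlparse(source_url).path.rstrip("/").split("/")[-1]  # last path segment
    return f"s3://{bucket}/{prefix}/{collection}/{item_id}.zarr"
```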
Remove workflow_dispatch and tags triggers from build workflow
Remove pull_request and workflow_dispatch triggers from test workflow
Fix permissions in test workflow (no write access needed for tests)
… artifacts

- Add S3 cleanup before conversion to remove stale base arrays
- Revert to Python entry points (convert.py, register.py) for maintainability
- Fix groups parameter type (string → list) for API compatibility
- Use clean args approach instead of inline bash scripts
- Fix TiTiler preview path to use overview arrays (/r10m/0:tci)

This addresses PR feedback by consolidating the cleanup fix with proper
Python-based workflow structure. All debugging iterations squashed.
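One way the pre-conversion cleanup could be implemented, assuming s3fs; endpoint and credential handling are simplified, so treat this as a sketch rather than the shipped code:

```python
import s3fs

def cleanup_stale_output(bucket: str, prefix: str, collection: str, item_id: str,
                         endpoint_url: str) -> None:
    """Remove any stale GeoZarr output before re-running the conversion."""
    fs = s3fs.S3FileSystem(client_kwargs={"endpoint_url": endpoint_url})
    target = f"{bucket}/{prefix}/{collection}/{item_id}.zarr"
    if fs.exists(target):
        fs.rm(target, recursive=True)  # stale base arrays would otherwise shadow new writes
```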
The --crs-groups flag triggers prepare_dataset_with_crs_info() in data-model,
which writes CRS metadata via ds.rio.write_crs() and creates the spatial_ref
coordinate variable required by TiTiler validation.

Restores working configuration from commit 21ea009.
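In minimal form, the CRS step described above looks like the following (a sketch assuming rioxarray; prepare_dataset_with_crs_info() itself lives in data-model):

```python
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)
import xarray as xr

def write_crs_metadata(ds: xr.Dataset, epsg: int) -> xr.Dataset:
    # write_crs() attaches the CRS and creates the spatial_ref coordinate
    # variable that TiTiler's validation expects.
    return ds.rio.write_crs(f"EPSG:{epsg}")
```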
- workflows/README: explain secret purposes (event ingestion, storage, API auth)
- workflows/README: add direct OVH Manager links for kubeconfig and S3 credentials
- README: delegate setup to workflows/README
- Separate operator usage (root README) from deployment setup (workflows/README)